Fix double-close race condition when handling dead clients #16

bahrmichael · 2025-10-02T15:19:35Z

Fixes #13

Amp generated PR description:

When a WebSocket client became unresponsive and timed out, the ping timer handler caused a "handle is already closing" error by attempting to close the same TCP handle twice.

The race condition occurred in tcp.lua's ping timeout logic:

close_client(client, 1006, ...) was called, which writes a Close frame and closes the handle in its write callback
_remove_client(server, client) was called immediately after, which also closed the handle if not is_closing()

Since the write callback hadn't fired yet, the handle appeared not to be closing, so both code paths attempted to close it.

Changes made:

Ping timeout handler (tcp.lua): Removed the call to close_client() for dead clients. Now only calls _remove_client() and on_disconnect() to drop the connection without sending a Close frame. This is correct per RFC 6455, which states code 1006 MUST NOT be sent in a Close frame.
close_client() hardening (client.lua): Made fully defensive:
- Checks is_closing() before any close operation
- Never sends code 1006 in Close frame (RFC 6455 compliance)
- Fixed state transitions: sets "closing" before I/O, "closed" after
- Guards all close() calls to prevent double-close races
_remove_client() hardening (tcp.lua): Added explicit state management and a comment explaining that close() is async but marking "closed" immediately is safe since the client is removed from the active list.

Why this implementation is safe:

No code path distinguishes between "closing" and "closed" states
All state checks only care if client.state == "connected" or not
Once state is != "connected", the client won't receive new operations
The is_closing() guards prevent any double-close attempts
Marking state as "closed" immediately after close() is semantically accurate since the client is already removed from server.clients

Amp-Thread-ID: https://ampcode.com/threads/T-eb322254-0c96-4330-ae76-53b06d81f3a3

Fixes #13 When a WebSocket client became unresponsive and timed out, the ping timer handler caused a "handle is already closing" error by attempting to close the same TCP handle twice. The race condition occurred in tcp.lua's ping timeout logic: 1. close_client(client, 1006, ...) was called, which writes a Close frame and closes the handle in its write callback 2. _remove_client(server, client) was called immediately after, which also closed the handle if not is_closing() Since the write callback hadn't fired yet, the handle appeared not to be closing, so both code paths attempted to close it. Changes made: 1. Ping timeout handler (tcp.lua): Removed the call to close_client() for dead clients. Now only calls _remove_client() and on_disconnect() to drop the connection without sending a Close frame. This is correct per RFC 6455, which states code 1006 MUST NOT be sent in a Close frame. 2. close_client() hardening (client.lua): Made fully defensive: - Checks is_closing() before any close operation - Never sends code 1006 in Close frame (RFC 6455 compliance) - Fixed state transitions: sets "closing" before I/O, "closed" after - Guards all close() calls to prevent double-close races 3. _remove_client() hardening (tcp.lua): Added explicit state management and a comment explaining that close() is async but marking "closed" immediately is safe since the client is removed from the active list. Why this implementation is safe: - No code path distinguishes between "closing" and "closed" states - All state checks only care if client.state == "connected" or not - Once state is != "connected", the client won't receive new operations - The is_closing() guards prevent any double-close attempts - Marking state as "closed" immediately after close() is semantically accurate since the client is already removed from server.clients Amp-Thread-ID: https://ampcode.com/threads/T-eb322254-0c96-4330-ae76-53b06d81f3a3 Co-authored-by: Amp <[email protected]>

burmudar · 2025-10-07T15:00:32Z

@bahrmichael forgot about this - will get to it by end of week!

bahrmichael requested a review from burmudar October 2, 2025 15:19

feat: nicer startup message

2f90591

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix double-close race condition when handling dead clients #16

Fix double-close race condition when handling dead clients #16

Uh oh!

bahrmichael commented Oct 2, 2025

Uh oh!

burmudar commented Oct 7, 2025

Uh oh!

Uh oh!

Fix double-close race condition when handling dead clients #16

Are you sure you want to change the base?

Fix double-close race condition when handling dead clients #16

Uh oh!

Conversation

bahrmichael commented Oct 2, 2025

Uh oh!

burmudar commented Oct 7, 2025

Uh oh!

Uh oh!